28 research outputs found

    Time series data mining: preprocessing, analysis, segmentation and prediction. Applications

    Currently, the amount of data produced by information systems is increasing exponentially. This motivates the development of automatic techniques to process and mine these data correctly. Specifically, in this Thesis, we tackle these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) consists of several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis. In this Thesis, we focus on time series preprocessing, segmentation and prediction. Time series preprocessing is a prerequisite for subsequent tasks: for example, the reconstruction of missing values in incomplete parts of time series can be essential for clustering them. In this Thesis, we tackled the problem of massive missing data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. It is very common for buoys to stop working for different periods, which is usually related to malfunctioning or bad weather conditions. The relation between the time series of the different buoys is analysed and exploited to reconstruct the missing parts. In this context, evolutionary artificial neural networks (EANNs) with product units (PUs) are trained, showing that the resulting models are simple and able to recover these values with high precision. In the case of time series segmentation, the procedure consists of dividing the time series into different subsequences to achieve different purposes. This segmentation can be done with the aim of finding useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For instance, for paleoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of tipping points (TPs), whose detection was supported by expert opinions.
However, given that the expert had to individually evaluate every solution given by the algorithm, the evaluation of the results was very tedious. This led to an improvement in the body of the GA so that the procedure could be evaluated automatically. For significant wave height time series, the objective was the detection of groups which contain extreme waves, i.e. those which are relatively large with respect to other waves close in time. The main motivation is to design alert systems. This was done using a hybrid algorithm (HA), where a local search (LS) process was included by means of a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities in different periods of European stock markets was also tackled, with the aim of evaluating the influence of the different markets in Europe. When segmenting time series with the aim of reducing the number of points, different techniques have been proposed. However, this remains an open challenge, given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically-driven coral reef optimisation algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves the state of the art with respect to accuracy and robustness. This problem has also been tackled using an improvement of the barebones particle swarm optimisation (BBPSO) algorithm, which includes a dynamic update of the cognitive and social components during the evolution, combined with mathematical shortcuts for obtaining the fitness of the solutions, which significantly reduces the computational cost of previously proposed coral reef methods. In addition, the joint optimisation of both objectives (clustering quality and approximation quality), which are in conflict, is an interesting open challenge, also tackled in this Thesis.
For that purpose, a multi-objective evolutionary algorithm (MOEA) for time series segmentation is developed, improving both the clustering quality of the solutions and their approximation quality. Prediction in time series is the estimation of future values by observing and studying previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems: the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is much smaller than the number of standard values. Consequently, the prediction of these values cannot be done using standard algorithms without taking into account the imbalance ratio of the dataset. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing the high ability of the algorithm to detect and predict these special events. On the other hand, fog prediction is affected by the same problem, that is, the number of fog events is much lower than that of non-fog events, requiring special treatment too. A preprocessing of different data coming from sensors situated in different parts of the Valladolid airport is used to build a simple artificial neural network (ANN) model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of time series to guide different methodologies. For this, the estimation of a mixed distribution for SWH time series is used for fixing the threshold of Peak-Over-Threshold (POT) approaches. Also, the determination of the best-fitting distribution for the time series is used for discretising it and making a prediction which treats the problem as ordinal classification. The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
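    The point-reduction view of segmentation can be illustrated with a toy fitness function (a generic sketch, not any of the thesis algorithms): given candidate cut points, the series is approximated by the polyline joining those points, and the fitness is the approximation error.

    ```python
    import numpy as np

    def approximation_error(series, cuts):
        """RMSE between the original series and the polyline through the
        chosen cut points (a common approximation-quality measure in
        point-reduction segmentation)."""
        idx = np.unique(np.concatenate(([0], np.asarray(cuts, dtype=int),
                                        [len(series) - 1])))
        approx = np.interp(np.arange(len(series)), idx, series[idx])
        return np.sqrt(np.mean((series - approx) ** 2))
    ```

    An evolutionary algorithm would then search over the set of cut points, trading off this error against the number of points kept.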

    Time series segmentation by means of a multi-objective evolutionary algorithm

    Extraordinary Master's Thesis Award, academic year 2015-2016. Computer Engineering

    A multi-class classification model with parametrized target outputs for randomized-based feedforward neural networks

    Randomized-based Feedforward Neural Networks approach regression and classification (binary and multi-class) problems by minimizing the same optimization problem. Specifically, the model parameters are determined through the ridge regression estimator of the patterns projected in the hidden layer space (randomly generated in its neural network version) for models without direct links, and of the patterns projected in the hidden layer space along with the original input data for models with direct links. The targets are encoded for the multi-class classification problem according to the 1-of-J encoding (J being the number of classes), which implies that the model parameters are estimated to project all the patterns belonging to the corresponding class to one and the remaining patterns to zero. This approach has several drawbacks, which motivated us to propose an alternative optimization model for the framework. In the proposed optimization model, the parameters are estimated for each class so that its patterns are projected to a reference point (also optimized during the process), whereas the remaining patterns (not belonging to that class) are projected as far away as possible from the reference point. The problem is finally presented as a generalized eigenvalue problem. Four models are then presented: the neural network version of the algorithm and its corresponding kernel version, for the models with and without direct links. In addition, the optimization model has also been implemented in randomization-based multi-layer or deep neural networks. The empirical results obtained by the proposed models were compared to those reported by state-of-the-art models in terms of the correct classification rate and a separability index (which measures, in projection terms, the degree of separability of the patterns of each class from those of the other classes).
The proposed methods show very competitive performance in the separability index and prediction accuracy compared to the neural network version of the comparison methods (with and without direct links). Remarkably, the model provides significantly superior performance in deep models with direct links compared to its deep model counterpart.
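    The baseline described in the opening sentences (ridge regression over a random hidden-layer projection, 1-of-J target encoding, optional direct links) can be sketched as follows; the function names and hyperparameter values are illustrative, not taken from the paper.

    ```python
    import numpy as np

    def rvfl_fit(X, y, n_hidden=50, lam=1e-2, direct_links=True, seed=None):
        """Fit a randomization-based feedforward network: random hidden
        weights, ridge regression on 1-of-J encoded targets."""
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
        b = rng.normal(size=n_hidden)                 # random biases
        H = np.tanh(X @ W + b)                        # hidden projection
        D = np.hstack([H, X]) if direct_links else H  # direct links append X
        T = np.eye(int(y.max()) + 1)[y]               # 1-of-J encoding
        # ridge regression estimator of the output weights
        beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ T)
        return W, b, beta

    def rvfl_predict(X, W, b, beta, direct_links=True):
        H = np.tanh(X @ W + b)
        D = np.hstack([H, X]) if direct_links else H
        return (D @ beta).argmax(axis=1)              # class with largest output
    ```

    The paper's alternative replaces the fixed 0/1 targets with per-class reference points that are themselves optimized, leading to a generalized eigenvalue problem rather than a ridge system.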

    A mixed distribution to fix the threshold for Peak-Over-Threshold wave height estimation

    Modelling extreme value distributions, such as wave height time series where the higher waves are much less frequent than the lower ones, has been tackled from the point of view of Peak-Over-Threshold (POT) methodologies, where modelling is based on those values higher than a threshold. This threshold is usually predefined by the user, while the rest of the values are ignored. In this paper, we propose a new method to estimate the distribution of the complete time series, including both extreme and regular values. This methodology assumes that the time series can be modelled by a combination of a normal distribution and a uniform one. The resulting theoretical distribution is then used to fix the threshold for the POT methodology. The methodology is tested on nine real-world time series collected in the Gulf of Alaska, Puerto Rico and Gibraltar (Spain), which are provided by the National Data Buoy Center (USA) and Puertos del Estado (Spain). By using the Kolmogorov-Smirnov statistical test, the results confirm that the time series can be modelled with this type of mixed distribution. Based on this, the return values and the confidence intervals for wave height in different periods of time are also calculated.
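    A mixed normal-plus-uniform distribution of this kind can be fitted by expectation-maximisation; the sketch below is one plausible reading of the model, not the paper's exact estimation procedure.

    ```python
    import numpy as np

    def fit_normal_uniform(x, n_iter=200):
        """EM for the mixture w*Normal(mu, sigma) + (1-w)*Uniform(min, max):
        the normal component captures the regular values, the uniform
        component the rarer (extreme) ones."""
        a, b = x.min(), x.max()
        u_pdf = 1.0 / (b - a)                         # flat density on [a, b]
        mu, sigma, w = x.mean(), x.std(), 0.9         # rough initialisation
        for _ in range(n_iter):
            n_pdf = (np.exp(-0.5 * ((x - mu) / sigma) ** 2)
                     / (sigma * np.sqrt(2 * np.pi)))
            r = w * n_pdf / (w * n_pdf + (1 - w) * u_pdf)  # P(normal | x)
            w = r.mean()
            mu = (r * x).sum() / r.sum()
            sigma = np.sqrt((r * (x - mu) ** 2).sum() / r.sum())
        return mu, sigma, w
    ```

    Once fitted, the posterior responsibilities suggest a natural POT threshold, e.g. the smallest value above the mean for which the uniform component dominates.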

    An Evolutionary Artificial Neural Network approach for spatio-temporal wave height time series reconstruction

    This paper proposes a novel methodology for recovering missing time series data, a crucial task for subsequent Machine Learning (ML) analyses. The methodology is specifically applied to Significant Wave Height (SWH) time series in the field of marine engineering. The proposed approach involves two phases. Firstly, the SWH time series for each buoy is independently reconstructed using three transfer function models: regression-based, correlation-based, and distance-based. The distance-based transfer function exhibits the best overall performance. Secondly, Evolutionary Artificial Neural Networks (EANNs) are utilised for the final recovery of each time series, using as inputs highly correlated buoys that have been intermediately recovered. The EANNs are evolved considering two metrics: the novel squared error relevance area, which balances the importance of extreme and around-mean values, and the well-known mean squared error. The study considers SWH time series data from 15 buoys in two coastal zones in the United States. The results demonstrate that the distance-based transfer function is generally the best transfer function, and that EANNs outperform a range of state-of-the-art ML techniques in 12 out of the 15 buoys, with a number of connections comparable to linear models. Furthermore, the proposed methodology outperforms the two most popular approaches for time series reconstruction, BRITS and SAITS, for all buoys except one. Therefore, the proposed methodology provides a promising approach, which may be applied to time series from other fields, such as wind or solar energy farms in the field of green energy.
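    The first phase (transfer functions between correlated buoys) can be illustrated with a minimal regression-based sketch; the function name and the simple linear form are assumptions for illustration, not the paper's exact models.

    ```python
    import numpy as np

    def reconstruct_from_neighbour(target, neighbour):
        """Fit a linear transfer function on time steps where both buoys
        report a value, then fill the target's gaps (NaN) from the
        neighbouring buoy."""
        both = ~np.isnan(target) & ~np.isnan(neighbour)
        slope, intercept = np.polyfit(neighbour[both], target[both], 1)
        filled = target.copy()
        gaps = np.isnan(target) & ~np.isnan(neighbour)
        filled[gaps] = slope * neighbour[gaps] + intercept
        return filled
    ```

    In the paper's second phase, these intermediately recovered series from highly correlated buoys become the inputs of the EANNs that produce the final reconstruction.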

    Hybridization of neural network models for the prediction of Extreme Significant Wave Height segments

    This work proposes a hybrid methodology for the detection and prediction of Extreme Significant Wave Height (ESWH) periods in oceans. In a first step, the wave height time series is approximated by a labeled sequence of segments, which is obtained using a genetic algorithm in combination with a likelihood-based segmentation (GA+LS). Then, an artificial neural network classifier with hybrid basis functions is trained with a multiobjective evolutionary algorithm (MOEA) in order to predict the occurrence of future ESWH segments based on past values. The methodology is applied to a buoy in the Gulf of Alaska and another one in Puerto Rico. The results show that the GA+LS is able to segment and group the ESWH values, and the neural network models obtained by the MOEA make good predictions, maintaining a balance between global accuracy and minimum sensitivity for the detection of ESWH events. Moreover, hybrid neural networks are shown to lead to better results than pure models.
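    The prediction step (label of the next segment from past segments) amounts to building a supervised dataset from the labeled segment sequence; a schematic sketch with hypothetical per-segment feature vectors:

    ```python
    import numpy as np

    def segments_to_dataset(seg_features, seg_labels, window=3):
        """Each training pattern stacks the feature vectors of the previous
        `window` segments; the target is the label (e.g. ESWH / non-ESWH)
        of the segment that follows."""
        X = [np.concatenate(seg_features[i - window:i])
             for i in range(window, len(seg_features))]
        return np.array(X), np.array(seg_labels[window:])
    ```

    The resulting patterns can then be fed to any classifier, such as the evolved neural networks used in the paper.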

    Gamifying the Classroom for the Acquisition of Skills Associated with Machine Learning: A Two-Year Case Study

    Machine learning (ML) is the field of science that combines knowledge from artificial intelligence, statistics and mathematics with the aim of giving computers the ability to learn from data without being explicitly programmed to do so. It falls under the umbrella of Data Science and is usually practised by Computer Engineers, who become what are known as Data Scientists. Developing the necessary competences in this field is not a trivial task, and applying innovative methodologies such as gamification can smooth the initial learning curve. In this context, communities offering platforms for open competitions, such as Kaggle, can be used as a motivating element. The main objective of this work is to gamify the classroom with the idea of providing students with valuable hands-on experience by addressing a real problem, together with the possibility of cooperating and competing simultaneously to acquire ML competences. The innovative teaching experience, carried out over two years, resulted in high motivation, improved learning capacity and a continuous recycling of the knowledge that Computer Engineers must keep up to date.

    REX-001, a BM-MNC Enriched Solution, Induces Revascularization of Ischemic Tissues in a Murine Model of Chronic Limb-Threatening Ischemia

    Background: Bone Marrow Mononuclear Cells (BM-MNC) constitute a promising alternative for the treatment of Chronic Limb-Threatening Ischemia (CLTI), a disease characterized by extensive blockade of peripheral arteries, clinically presenting as excruciating pain at rest and ischemic ulcers which may lead to gangrene and amputation. BM-MNC implantation has been shown to be efficient in promoting angiogenesis and ameliorating ischemic symptoms in CLTI patients. However, the variability seen between clinical trials makes a further understanding of the mechanisms of action of BM-MNC necessary, as well as improvements in trial characteristics such as endpoints, inclusion/exclusion criteria and drug product compositions, in order to implement their use as stem-cell therapy. Materials: Herein, the effect of REX-001, a human-BM-derived cell suspension enriched for mononuclear cells, granulocytes and CD34+ cells, has been assessed in a murine model of CLTI. In addition, a REX-001 placebo solution containing BM-derived red blood cells (BM-RBCs) was also tested. Thus, 24 h after double ligation of the femoral artery, REX-001 and placebo were administered intramuscularly to Balb-c nude mice (n = 51), and follow-up of ischemic symptoms (blood flow perfusion, motility, ulceration and necrosis) was carried out for 21 days. The number of vessels and the vascular diameter sizes were measured within the ischemic tissues to evaluate neovascularization and arteriogenesis. Finally, several cell-tracking assays were performed to evaluate the potential biodistribution of these cells. Results: REX-001 induced a significant recovery of blood flow by increasing vascular density within the ischemic limbs, with no cell translocation to other organs. Moreover, cell-tracking assays confirmed a decrease in the number of infused cells two weeks post-injection despite on-going revascularization, suggesting a paracrine mechanism of action.
Conclusion: Overall, our data support the role of the REX-001 product in improving revascularization and ischemic reperfusion in CLTI.